Search results for " burrows"
showing 8 items of 8 documents
The colored longest common prefix array computed via sequential scans
2018
Due to the increased availability of large datasets of biological sequences, the tools for sequence comparison are now relying on efficient alignment-free approaches to a greater extent. Most of the alignment-free approaches require the computation of statistics of the sequences in the dataset. Such computations become impractical in internal memory when very large collections of long sequences are considered. In this paper, we present a new conceptual data structure, the colored longest common prefix array (cLCP), that allows to efficiently tackle several problems with an alignment-free approach. In fact, we show that such a data structure can be computed via sequential scans in semi-exter…
The Alternating BWT: an algorithmic perspective
2020
Abstract The Burrows-Wheeler Transform (BWT) is a word transformation introduced in 1994 for Data Compression. It has become a fundamental tool for designing self-indexing data structures, with important applications in several areas in science and engineering. The Alternating Burrows-Wheeler Transform (ABWT) is another transformation recently introduced in Gessel et al. (2012) [21] and studied in the field of Combinatorics on Words. It is analogous to the BWT, except that it uses an alternating lexicographical order instead of the usual one. Building on results in Giancarlo et al. (2018) [23] , where we have shown that BWT and ABWT are part of a larger class of reversible transformations, …
Balancing and clustering of words in the Burrows–Wheeler transform
2011
AbstractCompression algorithms based on Burrows–Wheeler transform (BWT) take advantage of the fact that the word output of BWT shows a local similarity and then turns out to be highly compressible. The aim of the present paper is to study such “clustering effect” by using notions and methods from Combinatorics on Words.The notion of balance of a word plays a central role in our investigation. Empirical observations suggest that balance is actually the combinatorial property of input word that ensure optimal BWT compression. Moreover, it is reasonable to assume that the more balanced the input word is, the more local similarity we have after BWT (and therefore the better the compression is).…
An extension of the Burrows-Wheeler Transform
2007
AbstractWe describe and highlight a generalization of the Burrows–Wheeler Transform (bwt) to a multiset of words. The extended transformation, denoted by ebwt, is reversible. Moreover, it allows to define a bijection between the words over a finite alphabet A and the finite multisets of conjugacy classes of primitive words in A∗. Besides its mathematical interest, the extended transform can be useful for applications in the context of string processing. In the last part of this paper we illustrate one such application, providing a similarity measure between sequences based on ebwt.
Lightweight LCP construction for very large collections of strings
2016
The longest common prefix array is a very advantageous data structure that, combined with the suffix array and the Burrows-Wheeler transform, allows to efficiently compute some combinatorial properties of a string useful in several applications, especially in biological contexts. Nowadays, the input data for many problems are big collections of strings, for instance the data coming from "next-generation" DNA sequencing (NGS) technologies. In this paper we present the first lightweight algorithm (called extLCP) for the simultaneous computation of the longest common prefix array and the Burrows-Wheeler transform of a very large collection of strings having any length. The computation is reali…
SORTING CONJUGATES AND SUFFIXES OF WORDS IN A MULTISET
2014
In this paper we are interested in the study of the combinatorial aspects related to the extension of the Burrows-Wheeler transform to a multiset of words. Such study involves the notion of suffixes and conjugates of words and is based on two different order relations, denoted by <lex and ≺ω, that, even if strictly connected, are quite different from the computational point of view. In particular, we introduce a method that only uses the <lex sorting among suffixes of a multiset of words in order to sort their conjugates according to ≺ω-order. In this study an important role is played by Lyndon words. This strategy could be used in applications specially in the field of Bioinformatic…
Lossless and nearly-lossless image compression based on combinatorial transforms
2011
Common image compression standards are usually based on frequency transform such as Discrete Cosine Transform or Wavelets. We present a different approach for loss-less image compression, it is based on combinatorial transform. The main transform is Burrows Wheeler Transform (BWT) which tends to reorder symbols according to their following context. It becomes a promising compression approach based on contextmodelling. BWT was initially applied for text compression software such as BZIP2 ; nevertheless it has been recently applied to the image compression field. Compression scheme based on Burrows Wheeler Transform is usually lossless ; therefore we imple-ment this algorithm in medical imagi…
Probable root structures and associated trace fossils from the Lower Pleistocene calcarenites of favignana island, southern italy: dilemmas of interp…
2012
Two types of large, branched structures from the Lower Pleistocene (Calabrian) high-energy calcarenites of Favignana Island are described: Faviradixus robustus gen. et sp. nov. and Egadiradixus rectibrachiatus gen. et sp. nov. They may be interpreted as root structures of large plants, trees and trees or shrubs, respectively. The former taxon co-occurs with the marine animal trace fossils Ophiomorpha nodosa , Ophiomorpha isp., Thalassinoides isp. and Beaconites isp. The interpretation as root structures although tentative is probable and can be related to short emergence episodes for the formation of E . rectibrachiatus or to longer emergence, responsible for the discontinuity at the base o…